sort:Optimize sort collation for long lines (#12144)

Open
mattsu2020 wants to merge 2 commits into uutils:main from mattsu2020:fix_sort_performance
@mattsu2020
Contributor

What changed

  • Avoid precomputing ICU collation sort keys for lines larger than 1 MiB.
  • Store optional collation key ranges so very long lines can fall back to lazy locale comparison during sorting.
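The two changes above can be sketched as follows. This is a minimal illustration, not the PR's code: `KeyedLines`, `toy_sort_key`, and `toy_locale_cmp` are hypothetical names, and ASCII case folding stands in for ICU collation so the example needs no external crates. The point it shows is the shape of the data: each line has an optional range into a shared key buffer, and a comparison uses the precomputed keys only when both lines have one.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Threshold from the PR description (1 MiB); the name is hypothetical.
pub const KEY_LEN_LIMIT: usize = 1024 * 1024;

/// Stand-in for building an ICU sort key: ASCII case folding only.
fn toy_sort_key(line: &[u8], out: &mut Vec<u8>) {
    out.extend(line.iter().map(|x| x.to_ascii_lowercase()));
}

/// Stand-in for a lazy locale comparison. It must order lines exactly
/// as a bytewise comparison of their keys would.
fn toy_locale_cmp(a: &[u8], b: &[u8]) -> Ordering {
    let fold = |x: &u8| x.to_ascii_lowercase();
    a.iter().map(fold).cmp(b.iter().map(fold))
}

pub struct KeyedLines<'a> {
    lines: Vec<&'a [u8]>,
    /// All precomputed keys, stored back to back.
    key_buf: Vec<u8>,
    /// `None` means the line was too long and has no precomputed key.
    key_ranges: Vec<Option<Range<usize>>>,
}

impl<'a> KeyedLines<'a> {
    pub fn new(lines: Vec<&'a [u8]>, limit: usize) -> Self {
        let mut key_buf = Vec::new();
        let mut key_ranges = Vec::with_capacity(lines.len());
        for line in &lines {
            if line.len() > limit {
                // Skip key materialization for very long lines.
                key_ranges.push(None);
            } else {
                let start = key_buf.len();
                toy_sort_key(line, &mut key_buf);
                key_ranges.push(Some(start..key_buf.len()));
            }
        }
        Self { lines, key_buf, key_ranges }
    }

    fn compare(&self, i: usize, j: usize) -> Ordering {
        match (&self.key_ranges[i], &self.key_ranges[j]) {
            // Fast path: both keys exist, compare them bytewise.
            (Some(a), Some(b)) => self.key_buf[a.clone()].cmp(&self.key_buf[b.clone()]),
            // Fallback: at least one key is missing, compare the lines lazily.
            _ => toy_locale_cmp(self.lines[i], self.lines[j]),
        }
    }

    pub fn sorted(&self) -> Vec<&'a [u8]> {
        let mut idx: Vec<usize> = (0..self.lines.len()).collect();
        idx.sort_by(|&i, &j| self.compare(i, j));
        idx.into_iter().map(|i| self.lines[i]).collect()
    }
}
```

Mixing the two paths is only sound if the lazy comparison and the key comparison agree on every pair of lines; in this sketch both apply the same case folding, so the result does not depend on the limit.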

Why

Fixes #12138. In UTF-8 locales, sort previously precomputed ICU collation keys for every input line. For inputs with a small number of very large lines, such as 26 lines of 200 MiB each, the cost of generating and storing multi-GiB collation keys dominated runtime.

Impact

Small and normal-sized lines keep the existing precomputed-key fast path. Very long lines skip the expensive key materialization and use locale_cmp when compared.

Validation

  • cargo check -p uu_sort
  • cargo test -p uu_sort
  • cargo test -p coreutils --test tests test_sort::test_default_unsorted_ints -- --exact
  • Compared output against GNU sort with cmp for 52 MiB and 130 MiB reproducer inputs.
  • Hyperfine on the issue-sized 5.1 GiB input with LC_ALL=en_US.UTF-8 --parallel 1 --buffer-size 8G:
    • uutils release: 5.054 s
    • GNU gsort 9.11: 33.685 s
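For readers who want the figures above as ratios, here is a small arithmetic sketch. The constants are copied from this description; the function names are hypothetical. It checks that 26 lines of 200 MiB come to roughly the 5.1 GiB quoted for the benchmark input, and it computes how much faster the uutils run was than the GNU run.

```rust
/// Mean wall-clock times reported above, in seconds.
pub const UUTILS_SECS: f64 = 5.054;
pub const GNU_SECS: f64 = 33.685;

/// Ratio of two run times; values above 1.0 mean `candidate` is faster.
pub fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// Total input size in GiB for `lines` lines of `mib_per_line` MiB each.
pub fn input_gib(lines: u64, mib_per_line: u64) -> f64 {
    (lines * mib_per_line) as f64 / 1024.0
}
```

With these numbers, `speedup(GNU_SECS, UUTILS_SECS)` is about 6.67 and `input_gib(26, 200)` is about 5.08.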

@mattsu2020 mattsu2020 changed the title from "[codex] Optimize sort collation for long lines" to "sort:Optimize sort collation for long lines" May 4, 2026
@mattsu2020 mattsu2020 marked this pull request as ready for review May 4, 2026 13:00
@github-actions

github-actions Bot commented May 4, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/follow-name (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/tail-n0f (passes in this run but fails in the 'main' branch)

@codspeed-hq

codspeed-hq Bot commented May 4, 2026

Merging this PR will degrade performance by 23.24%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

❌ 2 regressed benchmarks
✅ 315 untouched benchmarks
⏩ 46 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode        Benchmark               BASE      HEAD      Efficiency
Simulation  sort_key_field[500000]  767.8 ms  804.6 ms  -4.57%
Memory      sort_german_de_locale   3.3 MB    4.3 MB    -23.24%

Comparing mattsu2020:fix_sort_performance (8c85e7f) with main (485b156)
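The Efficiency column can be reproduced from the BASE and HEAD columns. The formula below, BASE / HEAD - 1, is inferred from the reported figures and is an assumption, not something CodSpeed states in this report; `efficiency_pct` is a hypothetical helper name.

```rust
/// Percentage change as it appears to be reported in the table:
/// negative when HEAD uses more time or memory than BASE.
/// Assumption: the formula is inferred from the figures in the report.
pub fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

The time row matches to two decimals. The memory row comes out slightly different from the reported -23.24%, presumably because the displayed 3.3 MB and 4.3 MB are rounded.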

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@xtqqczze
Contributor

xtqqczze commented May 4, 2026

Out of interest, why choose 1 MiB as the limit, rather than something lower like u16::MAX?

@mattsu2020
Contributor Author

> Out of interest, why choose 1 MiB as the limit, rather than something lower like u16::MAX?

Measurements with a 64 KiB threshold showed at least equivalent performance on the issue workload, so we will change the threshold to u16::MAX.
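As a note on the numbers: u16::MAX is 65,535 bytes, one byte short of 64 KiB, and 16 times smaller than the original 1 MiB limit. A minimal sketch of the resulting check follows. The names are hypothetical, and whether the actual code treats a line of exactly the limit as short or long is an assumption here.

```rust
/// Proposed limit: u16::MAX bytes (65_535), one byte below 64 KiB.
pub const U16_KEY_LEN_LIMIT: usize = u16::MAX as usize;

/// Original limit from the PR description.
pub const ONE_MIB: usize = 1 << 20;

/// Assumption: lines up to and including the limit keep a precomputed key.
pub fn should_precompute_key(line_len: usize) -> bool {
    line_len <= U16_KEY_LEN_LIMIT
}
```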

@xtqqczze
Contributor

xtqqczze commented May 4, 2026

@mattsu2020 Could you also add a benchmark (in separate PR)?

@mattsu2020
Contributor Author

> @mattsu2020 Could you also add a benchmark (in separate PR)?

Sure, I’ll keep this PR focused on the fix and open a separate PR adding a benchmark for long-line locale collation.

@sylvestre sylvestre force-pushed the fix_sort_performance branch from 23e4bb3 to 8c85e7f Compare May 8, 2026 05:48
Development

Successfully merging this pull request may close these issues.

Case where GNU sort is 40 times faster than uutils